Morphological Ending-based Strategies of Unknown Word Estimation for Statistical POS Urdu Tagger
Abstract
Statistical language models are widely used in natural language processing to solve disambiguation problems. Over the past decades, various POS tagging techniques have been proposed for English, European, and East Asian languages. In this paper we focus on POS tagging for Urdu, because tagging systems for Urdu are still in their infancy. We combine two approaches, a statistical technique and a morphological-ending-based technique, to assign appropriate syntactic categories, using an Urdu corpus as our experimental data. Our tagging system works in two stages. In the first stage, we apply a statistical word-level language model to compute probabilities and assign the appropriate tag to each word. In the second stage, we extract all ambiguous and unknown words in the corpus and apply morphological-ending-based rules to resolve the cases in which the statistical model fails to assign an appropriate tag to a given word. The development of this tagger is an initial step toward Urdu POS tagging. The experimental results show that performance on unknown words improves when morphological-ending-based features are added to the statistical model. The evaluation method employed shows the significance of the experimental results and the effectiveness of morphological endings in combination with the statistical method. This is pioneering work toward building a POS tagger for Urdu through morphological-ending-based strategies.
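The two-stage process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the most-frequent-tag word model, the romanized toy tokens, the tag names, and the ending rules in `ENDING_RULES` are all assumptions invented for illustration, and the sketch handles only unknown words, not the ambiguous words the paper also sends to the second stage.

```python
from collections import Counter, defaultdict


def train_word_model(tagged_corpus):
    """Stage 1 model: map each known word to its most frequent tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_by_ending(word, ending_rules, default_tag):
    """Stage 2 fallback: use the longest matching ending, else a default tag."""
    for ending in sorted(ending_rules, key=len, reverse=True):
        if word.endswith(ending):
            return ending_rules[ending]
    return default_tag


def tag_sentence(words, word_model, ending_rules, default_tag="NN"):
    """Tag known words with the word model and unknown words by their ending."""
    tagged = []
    for word in words:
        if word in word_model:
            tagged.append((word, word_model[word]))
        else:
            tagged.append((word, tag_by_ending(word, ending_rules, default_tag)))
    return tagged


# Hypothetical toy data: romanized tokens and tags chosen only for illustration.
TRAINING_DATA = [
    ("larka", "NN"),
    ("kitab", "NN"),
    ("acha", "JJ"),
    ("hai", "VB"),
    ("larka", "NN"),
]

# Hypothetical ending rules; the paper's actual rule set is not given here.
ENDING_RULES = {"on": "NN", "ta": "VB"}

word_model = train_word_model(TRAINING_DATA)

if __name__ == "__main__":
    sentence = ["larka", "kitabon", "parhta", "hai", "xyz"]
    for word, tag in tag_sentence(sentence, word_model, ENDING_RULES):
        print(word, tag)
```

In this sketch the fallback is consulted only when a word is missing from the word model, which mirrors the abstract's description of applying ending-based rules where the statistical model cannot assign a tag.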
Related resources
Syllable-Pattern-Based Unknown-Morpheme Segmentation and Estimation for Hybrid Part-of-Speech Tagging of Korean
Most errors in Korean morphological analysis and part-of-speech (POS) tagging are caused by unknown morphemes. This paper presents a syllable-pattern-based generalized unknown-morpheme-estimation method with POSTAG (POStech TAGger), which is a statistical and rule-based hybrid POS tagging system. This method of guessing unknown morphemes is based on a combination of a morpheme pattern dictionary...
Tagger Voting for Urdu
In this paper, we focus on improving part-of-speech (POS) tagging for Urdu by using existing tools and data for the language. In our experiments, we use Humayoun’s morphological analyzer, the POS tagging module of an Urdu Shallow Parser and our own SVM Tool tagger trained on CRULP manually annotated data. We convert the output of the taggers to a common format and more importantly unify their t...
Automated part-of-speech analysis of Urdu: conceptual and technical issues
Part-of-speech (POS) tagging is the process of labelling tokens in a text with tags that indicate their morphosyntactic category, and has a wide range of applications in computational and corpus linguistics, such as the production of corpus-based dictionaries and grammars. This paper describes an experiment in extending POS tagging to a hitherto untagged language, Urdu. The most challenging tas...
Maximum Entropy Based Bengali Part of Speech Tagging
Part of Speech (POS) tagging can be described as a task of doing automatic annotation of syntactic categories for each word in a text document. This paper presents a POS tagger for Bengali using the statistical Maximum Entropy (ME) model. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the various POS cl...
A Hybrid Morphology-Based POS Tagger for Persian
In many applications of natural language processing (NLP) grammatically tagged corpora are needed. Thus Part of Speech (POS) Tagging is of high importance in the domain of NLP. Many taggers are designed with different approaches to reach high performance and accuracy. These taggers usually deal with inter-word relations and they make use of lexicons. In this paper we present a new tagging algor...
Publication date: 2007